This is the original data set from the company where I am working. As I am working as a Data Analyst intern in an event management industry, most of the time I have to check which participants have been activated their profile till the event date. If the any of the participant did not activate their profile, I confirm my colleagues who are working with the event organisers so that they can send a notification to the participants to activate their profile. As this is the original data set so for security purpose of the company I can't upload the data. But I can tell that I have worked only with the participant id, registration date, registration time, invitation time, participant type, registration state columns.
As I want to visualize the graph for the unique number of active participants according to their registration date and also unique number of participants accorto their invitation date so I have made four functions.
Let's jump to the work!
#import librarires
import nbformat
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime as dt
# Define the function. I am giving the parameters as the dataset and the columns which I needed.
# Function for all types of active participants
def all_active_participant(filename, column1, column2, column3, column4):
data=pd.read_excel(filename) # read the dataset
data[[column1,column2]]=data[[column1,column2]].apply(pd.to_datetime) # convert the date to the dat time structure in python
df=data[data[column4]=='Active'] # select the data only for the active status of the registration state
df=df.loc[:,[column3,column1]] # loc the the dataset on 'id' and 'registration date' column
df=df[column1].value_counts().sort_index().rename_axis('registration_date').reset_index(name='unique_count') # count unique number of participants using 'id' column
df['cum_reg']=df['unique_count'].cumsum() # in the new dataframe add a new column of cumulative value for the unique number of participants value
return df # return the new dataframe
# Function for the all types of invited participants
def all_invited_participant(filename, column1, column2, column3, column4):
data=pd.read_excel(filename)
data[[column1,column2]]=data[[column1,column2]].apply(pd.to_datetime)
df_inv=data.loc[:,[column2, column1]]
df_inv=df_inv[column2].value_counts().sort_index().rename_axis('invitation_date').reset_index(name='unique_count1')
df_inv['cum_inv']=df_inv['unique_count1'].cumsum()
return df_inv
a=all_active_participant('talentboost.xlsx', 'Registration Date', 'Invitation date', 'id', 'registrationState')
b=all_invited_participant('talentboost.xlsx', 'Registration Date', 'Invitation date', 'id', 'registrationState')
print(a)
print(b)
registration_date unique_count cum_reg 0 2021-10-11 2 2 1 2021-10-12 2 4 2 2021-10-14 1 5 3 2021-10-18 46 51 4 2021-10-19 44 95 5 2021-10-20 26 121 6 2021-10-21 35 156 7 2021-10-22 14 170 8 2021-10-23 3 173 9 2021-10-24 3 176 10 2021-10-25 36 212 11 2021-10-26 34 246 12 2021-10-27 65 311 13 2021-10-28 104 415 14 2021-10-29 1 416 15 2021-11-01 1 417 invitation_date unique_count1 cum_inv 0 2021-10-11 420 420 1 2021-10-12 41 461 2 2021-10-13 20 481 3 2021-10-14 27 508 4 2021-10-15 33 541 5 2021-10-16 1 542 6 2021-10-17 1 543 7 2021-10-18 1 544 8 2021-10-19 1 545 9 2021-10-20 1 546 10 2021-10-21 1 547 11 2021-10-22 1 548 12 2021-10-23 1 549 13 2021-10-24 1 550 14 2021-10-25 1 551 15 2021-10-26 1 552 16 2021-10-27 1 553 17 2021-10-28 1 554 18 2021-10-29 10 564
# Function for individual participants for active status
def individual_active_participant(filename, column1, column2, column3, column4, participantType):
data=pd.read_excel(filename) # read the dataset
data[[column1,column2]]=data[[column1,column2]].apply(pd.to_datetime)
df1= data[(data.participantType==participantType) & (data.registrationState=='Active')]
df1=df1.loc[:,[column3,column1]]
df1=df1[column1].value_counts().sort_index().rename_axis('reg_date').reset_index(name='unique_value')
df1['cumulative_value']=df1['unique_value'].cumsum() # create a new column to the dataframe and calculate the cumulative value from the unique number of the participants column
return df1
# Function for individual participants for invitation date
def individual_invited_participant(filename, column1, column2, column3, column4, participantType):
data=pd.read_excel(filename) # read the dataset
data[[column1,column2]]=data[[column1,column2]].apply(pd.to_datetime)
df1_inv= data[(data.participantType==participantType)]
df1_inv=df1_inv.loc[:,[column3,column2]]
df1_inv=df1_inv[column2].value_counts().sort_index().rename_axis('invitation_date').reset_index(name='unique_value1')
df1_inv['cum']=df1_inv['unique_value1'].cumsum() # create a new column to the dataframe and calculate the cumulative value from the unique number of the participants column
return df1_inv
# In this data set set, there are there types of participants such as Job seeker, recruiter and talent boost service provider. So, I am call the functions three times for the different participants for the the registration and invitation date.
c_participant=individual_active_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Job seeker')
d_participant=individual_invited_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Job seeker')
e_participant=individual_active_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Recruiter')
f_participant=individual_invited_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Recruiter')
g_participant=individual_active_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Talent Boost service provider')
h_participant=individual_invited_participant('talentboost.xlsx','Registration Date', 'Invitation date', 'id', 'registrationState', 'Talent Boost service provider')
print(c_participant)
print(d_participant)
print(e_participant)
print(f_participant)
print(g_participant)
print(h_participant)
reg_date unique_value cumulative_value
0 2021-10-11 1 1
1 2021-10-18 33 34
2 2021-10-19 32 66
3 2021-10-20 15 81
4 2021-10-21 10 91
5 2021-10-22 8 99
6 2021-10-23 2 101
7 2021-10-24 3 104
8 2021-10-25 10 114
9 2021-10-26 12 126
10 2021-10-27 42 168
11 2021-10-28 38 206
invitation_date unique_value1 cum
0 2021-10-11 206 206
1 2021-10-12 28 234
2 2021-10-13 15 249
3 2021-10-14 17 266
4 2021-10-15 25 291
5 2021-10-16 1 292
6 2021-10-17 1 293
7 2021-10-18 1 294
8 2021-10-19 1 295
9 2021-10-20 1 296
10 2021-10-21 1 297
11 2021-10-22 1 298
12 2021-10-25 1 299
13 2021-10-27 1 300
14 2021-10-28 1 301
15 2021-10-29 5 306
reg_date unique_value cumulative_value
0 2021-10-11 1 1
1 2021-10-18 3 4
2 2021-10-19 2 6
3 2021-10-20 3 9
4 2021-10-21 8 17
5 2021-10-22 2 19
6 2021-10-23 1 20
7 2021-10-25 5 25
8 2021-10-26 7 32
9 2021-10-27 4 36
10 2021-10-28 7 43
invitation_date unique_value1 cum
0 2021-10-11 43 43
1 2021-10-14 1 44
2 2021-10-15 2 46
3 2021-10-26 1 47
4 2021-10-29 1 48
reg_date unique_value cumulative_value
0 2021-10-18 10 10
1 2021-10-19 7 17
2 2021-10-20 7 24
3 2021-10-21 16 40
4 2021-10-22 4 44
5 2021-10-25 19 63
6 2021-10-26 14 77
7 2021-10-27 17 94
8 2021-10-28 57 151
9 2021-11-01 1 152
invitation_date unique_value1 cum
0 2021-10-11 152 152
1 2021-10-12 4 156
2 2021-10-13 3 159
3 2021-10-14 2 161
4 2021-10-15 6 167
5 2021-10-24 1 168
6 2021-10-29 1 169
# Graph for the all types of participants
fig = go.Figure()
fig.add_trace(go.Scatter(x=a['registration_date'], y=a['cum_reg'], legendgroup='group',legendgrouptitle_text='All participant type', name='Cumulative graph', mode='lines+markers', marker=dict(color='crimson', size=6)))
fig.add_trace(go.Scatter(x=b['invitation_date'], y=b['cum_inv'], legendgroup='group',name='Invitation date Cumulative graph', mode='lines+markers', marker=dict(color='#bcbd22', size=6)))
fig.add_trace(go.Bar(x=a["registration_date"], y=a["unique_count"],legendgroup='group',name='Total registered id on unique date'))
# Graph for the individual participants
# In this dataset, there are 3 types of participants.So I am making the graph for the three types of participants
#Job seeker
fig.add_trace(go.Scatter(x=c_participant['reg_date'], y=c_participant['cumulative_value'], legendgroup='group2',legendgrouptitle_text='Participant type-Job seeker', name='Cumulative graph', mode='lines+markers', marker=dict(color='MediumPurple', size=6)))
fig.add_trace(go.Scatter(x=d_participant['invitation_date'], y=d_participant['cum'], legendgroup='group2',name='Invitation date Cumulative graph for job seeker', mode='lines+markers', marker=dict(color='#e377c2', size=6)))
fig.add_trace(go.Bar(x=c_participant["reg_date"], y=c_participant["unique_value"], legendgroup='group2', name='Total registerd id on unique date'))
# For recruiter
fig.add_trace(go.Scatter(x=e_participant['reg_date'], y=e_participant['cumulative_value'], legendgroup='group3',legendgrouptitle_text='Participant type-Recruiter', name='Cumulative graph', mode='lines+markers', marker=dict(color='chartreuse', size=6)))
fig.add_trace(go.Scatter(x=f_participant['invitation_date'], y=f_participant['cum'], legendgroup='group3',name='Invitation date Cumulative graph for recruiter', mode='lines+markers', marker=dict(color='#2ca02c', size=6)))
fig.add_trace(go.Bar(x=e_participant["reg_date"], y=e_participant["unique_value"], legendgroup='group3', name='Total registerd id on unique date',marker=dict(color='chartreuse')))
#For service provider
fig.add_trace(go.Scatter(x=e_participant['reg_date'], y=e_participant['cumulative_value'], legendgroup='group4',legendgrouptitle_text='Participant ype-Talent boost service provider', name='Cumulative graph', mode='lines+markers', marker=dict(color='darkblue', size=6)))
fig.add_trace(go.Scatter(x=f_participant['invitation_date'], y=f_participant['cum'], legendgroup='group4',name='Invitation date Cumulative graph for service provider', mode='lines+markers', marker=dict(color='#2ca02c', size=6)))
fig.add_trace(go.Bar(x=e_participant["reg_date"], y=e_participant["unique_value"], legendgroup='group4', name='Total registerd id on unique date',marker=dict(color='darkblue')))
fig.update_xaxes(title='Registration Date',visible=True, showticklabels=True)
fig.update_yaxes(title='Unique Registration/Day',visible=True, showticklabels=True)
fig.update_layout(xaxis_range=[a['registration_date'].min(),'2021-11-02'])
fig.update_xaxes(rangeslider_visible=True)
fig.update_layout(
title={
'text': "Aggregative active registration per day & the invitation per day",
'y':.95,
'x':0.4,
'xanchor': 'center',
'yanchor': 'top'})
fig.show()
The main task was to visualize the aggregative curve for the unique oarticipants on the unique date. I have tried to make it work for all the similar data set. We will give the structure of the column name to the developers and tell them to capture the raw data like this and it will work.When the management sector will see the graph before the event, they will be able to check are the participants activating their profile or not, how the rate of activation is growing comparing the invitation date. For example, from the job seeker graph, it can be illustarted that the on 27th October, the highest number of the job seekers (42) have been activated their profile and till October 28, total activated job seekers were 206 where as total 306 invitation had been sent to the job seekers till October 29. May be for some reason, some job seeker did not activate their profile. The reason could be maybe they forgot and did not get the emailor they have lost link for activating it etc. When they will see that activation rate is low,they can tell the event organizer for campaigining their event and send an email to the participants who doesn't activate their profile. If one wants to see only the one kind of participants, she/he can only click the any group of the legend; it will show the graph only for the selected one.
Thank you!